Learning mixtures of separated nonspherical Gaussians
Mixtures of Gaussian (or normal) distributions arise in a variety of
application areas. Many heuristics have been proposed for the task of finding
the component Gaussians given samples from the mixture, such as the EM
algorithm, a local-search heuristic from Dempster, Laird and Rubin [J. Roy.
Statist. Soc. Ser. B 39 (1977) 1-38]. These do not provably run in polynomial
time. We present the first algorithm that provably learns the component
Gaussians in time that is polynomial in the dimension. The Gaussians may have
arbitrary shape, but they must satisfy a ``separation condition'' which places
a lower bound on the distance between the centers of any two component
Gaussians. The mathematical results at the heart of our proof are ``distance
concentration'' results--proved using isoperimetric inequalities--which
establish bounds on the probability distribution of the distance between a pair
of points generated according to the mixture. We also formalize the more
general problem of max-likelihood fit of a Gaussian mixture to unstructured
data.
Comment: Published at http://dx.doi.org/10.1214/105051604000000512 in the
Annals of Applied Probability (http://www.imstat.org/aap/) by the Institute
of Mathematical Statistics (http://www.imstat.org).
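As a toy illustration of the ``distance concentration'' phenomenon (a numpy sketch of our own, not the paper's algorithm; the dimension, sample sizes, and separation below are arbitrary choices), one can sample from a mixture of two well-separated spherical Gaussians and compare intra-component with inter-component pairwise distances:

    import numpy as np

    rng = np.random.default_rng(0)
    d, n, sep = 200, 200, 10.0  # dimension, samples per component, center distance

    mu2 = np.zeros(d)
    mu2[0] = sep
    X1 = rng.normal(size=(n, d))          # component 1: N(0, I)
    X2 = rng.normal(size=(n, d)) + mu2    # component 2: N(mu2, I)

    def pairwise(A, B):
        # All Euclidean distances between rows of A and rows of B.
        return np.linalg.norm(A[:, None, :] - B[None, :, :], axis=-1)

    intra = pairwise(X1, X1)[np.triu_indices(n, k=1)]
    inter = pairwise(X1, X2).ravel()

    # In high dimension both sets of distances concentrate tightly around their
    # means (roughly sqrt(2d) and sqrt(2d + sep^2)), so intra-component and
    # inter-component pairs become cleanly distinguishable from one another.
    print(f"intra: mean {intra.mean():.2f}, std {intra.std():.2f}")
    print(f"inter: mean {inter.mean():.2f}, std {inter.std():.2f}")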
On the Optimization of Deep Networks: Implicit Acceleration by Overparameterization
Conventional wisdom in deep learning states that increasing depth improves
expressiveness but complicates optimization. This paper suggests that,
sometimes, increasing depth can speed up optimization. The effect of depth on
optimization is decoupled from expressiveness by focusing on settings where
additional layers amount to overparameterization - linear neural networks, a
well-studied model. Theoretical analysis, as well as experiments, show that
here depth acts as a preconditioner which may accelerate convergence. Even on
simple convex problems such as linear regression with $\ell_p$ loss, $p > 2$,
gradient descent can benefit from transitioning to a non-convex
overparameterized objective, more than it would from some common acceleration
schemes. We also prove that it is mathematically impossible to obtain the
acceleration effect of overparameterization via gradients of any regularizer.
Comment: Published at the International Conference on Machine Learning (ICML)
2018.
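A minimal sketch of the phenomenon (our own toy setup, not the paper's experiments: synthetic least-squares data, $\ell_4$ loss, and a depth-2 reparameterization $w = a v$ with a scalar second layer $a$) compares gradient descent on the plain objective with gradient descent on the overparameterized one:

    import torch

    torch.manual_seed(0)
    n, d, p = 100, 10, 4
    X = torch.randn(n, d)
    y = X @ torch.randn(d)

    def lp_loss(w):
        # Linear regression with l_p loss for p > 2 (here p = 4).
        return ((X @ w - y).abs() ** p).mean()

    w = torch.zeros(d, requires_grad=True)   # plain parameterization
    v = torch.zeros(d, requires_grad=True)   # factor 1 of the depth-2 model
    a = torch.ones(1, requires_grad=True)    # factor 2 (scalar "extra layer")

    lr = 1e-3
    for step in range(5001):
        loss_w = lp_loss(w)                  # plain gradient descent
        (g_w,) = torch.autograd.grad(loss_w, [w])
        loss_av = lp_loss(a * v)             # same end-to-end map, deeper parameterization
        g_v, g_a = torch.autograd.grad(loss_av, [v, a])
        with torch.no_grad():
            w -= lr * g_w
            v -= lr * g_v
            a -= lr * g_a
        if step % 1000 == 0:
            print(step, loss_w.item(), loss_av.item())

Both runs optimize the same end-to-end linear map; only the parameterization, and hence the gradient dynamics, differs.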
Learning Topic Models - Going beyond SVD
Topic Modeling is an approach used for automatic comprehension and
classification of data in a variety of settings, and perhaps the canonical
application is in uncovering thematic structure in a corpus of documents. A
number of foundational works both in machine learning and in theory have
suggested a probabilistic model for documents, whereby documents arise as a
convex combination of (i.e. distribution on) a small number of topic vectors,
each topic vector being a distribution on words (i.e. a vector of
word-frequencies). Similar models have since been used in a variety of
application areas; the Latent Dirichlet Allocation or LDA model of Blei et al.
is especially popular.
Theoretical studies of topic modeling focus on learning the model's
parameters assuming the data is actually generated from it. Existing approaches
for the most part rely on Singular Value Decomposition (SVD), and consequently
have one of two limitations: they either need to assume that each document
contains only one topic, or else can only recover the span of the topic
vectors rather than the topic vectors themselves.
This paper formally justifies Nonnegative Matrix Factorization (NMF), an
analog of SVD in which all vectors are nonnegative, as a main tool in this
context. Using this tool we give the first polynomial-time algorithm for
learning topic models without the above two limitations. The algorithm uses a
fairly mild assumption about the underlying topic matrix called separability,
which is usually found to hold in real-life data. A compelling feature of our
algorithm is that it generalizes to models that incorporate topic-topic
correlations, such as the Correlated Topic Model and the Pachinko Allocation
Model.
We hope that this paper will motivate further theoretical results that use
NMF as a replacement for SVD - just as NMF has come to replace SVD in many
applications.
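As context, a minimal sketch of NMF in the topic-modeling role, using scikit-learn's local-search NMF solver on a made-up four-document corpus; this illustrates the nonnegative factorization being discussed, not the paper's provable separability-based algorithm:

    from sklearn.decomposition import NMF
    from sklearn.feature_extraction.text import CountVectorizer

    docs = [
        "stock market trading shares bank economy",
        "bank interest rates market economy stock",
        "gene protein cell biology dna expression",
        "dna sequencing gene expression cell protein",
    ]
    vec = CountVectorizer()
    M = vec.fit_transform(docs)        # document-by-word count matrix

    # Factor M ~ W H with nonnegative factors: rows of H are topic vectors
    # (distributions over words, up to scaling), rows of W are per-document
    # topic weights.
    nmf = NMF(n_components=2, init="nndsvd", random_state=0)
    W = nmf.fit_transform(M)
    H = nmf.components_

    words = vec.get_feature_names_out()
    for k, topic in enumerate(H):
        top = topic.argsort()[::-1][:4]
        print(f"topic {k}:", [words[i] for i in top])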
Towards a better approximation for sparsest cut?
We give a new $(1+\epsilon)$-approximation for the sparsest cut problem on
graphs where small sets expand significantly more than the sparsest cut (sets
of size $n/r$ expand by a factor $\sqrt{\log n \log r}$ bigger, for some small
$r$; this condition holds for many natural graph families). We give two
different algorithms. One involves Guruswami-Sinop rounding on the level-$r$
Lasserre relaxation. The other is combinatorial and involves a new notion
called {\em Small Set Expander Flows} (inspired by the {\em expander flows} of
ARV) which we show exist in the input graph. Both algorithms run in time
$2^{O(r)} \mathrm{poly}(n)$. We also show similar approximation algorithms in
graphs with genus $g$ with an analogous local expansion condition. This is the
first algorithm we know of that achieves $(1+\epsilon)$-approximation on such
a general family of graphs.
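For contrast, a sketch of the classical spectral baseline for sparsest cut (a Cheeger-style sweep over the Fiedler vector, scoring prefixes by the uniform sparsity $E(S,\bar{S})/\min(|S|,|\bar{S}|)$); the paper's Lasserre-rounding and Small Set Expander Flow algorithms are substantially more involved and are not reproduced here:

    import numpy as np

    def sweep_cut(A):
        # Sparsest-cut heuristic: sort vertices by the Fiedler vector of the
        # Laplacian, then take the best prefix cut of that ordering.
        n = A.shape[0]
        L = np.diag(A.sum(axis=1)) - A      # unnormalized graph Laplacian
        _, vecs = np.linalg.eigh(L)
        order = np.argsort(vecs[:, 1])      # Fiedler (second) eigenvector
        in_set = np.zeros(n, dtype=bool)
        cut, best, best_set = 0.0, np.inf, None
        for i, v in enumerate(order[:-1]):
            # Moving v into S: edges from v to outside join the cut, edges
            # from v into S leave it (assumes a zero diagonal).
            cut += A[v, ~in_set].sum() - A[v, in_set].sum()
            in_set[v] = True
            sparsity = cut / min(i + 1, n - i - 1)
            if sparsity < best:
                best, best_set = sparsity, in_set.copy()
        return best, best_set

    # Example: two 4-cliques joined by a single edge; the sweep finds that edge.
    A = np.zeros((8, 8))
    A[:4, :4] = A[4:, 4:] = 1
    np.fill_diagonal(A, 0)
    A[3, 4] = A[4, 3] = 1
    print(sweep_cut(A))   # sparsity 1/4, one clique on each side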